[core] Combine initial run fetch, event fetch, and run_started event creation#1569
[core] Combine initial run fetch, event fetch, and run_started event creation#1569VaguelySerious wants to merge 1 commit intomainfrom
Conversation
…creation Signed-off-by: Peter Wielander <mittgfu@gmail.com>
🦋 Changeset detectedLatest commit: 38b6674 The changes in this PR will be included in the next version bump. This PR includes changesets to release 20 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
🧪 E2E Test Results❌ Some tests failed Summary
❌ Failed Tests🌍 Community Worlds (60 failed)mongodb (3 failed):
redis (2 failed):
turso (55 failed):
Details by Category✅ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
✅ 🐘 Local Postgres
✅ 🪟 Windows
❌ 🌍 Community Worlds
✅ 📋 Other
|
📊 Benchmark Results
workflow with no steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) workflow with 1 step💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) workflow with 10 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) workflow with 25 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro workflow with 50 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express Promise.all with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) Promise.all with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express Promise.all with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) Promise.race with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) Promise.race with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Nitro | Express Promise.race with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) workflow with 10 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) workflow with 25 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) workflow with 50 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) workflow with 10 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) workflow with 25 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) workflow with 50 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro Stream Benchmarks (includes TTFB metrics)workflow with stream💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express stream pipeline with 5 transform steps (1MB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) 10 parallel streams (1MB each)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) fan-out fan-in 10 streams (1MB each)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro SummaryFastest Framework by WorldWinner determined by most benchmark wins
Fastest World by FrameworkWinner determined by most benchmark wins
Column Definitions
Worlds:
|
| .from(Schema.events) | ||
| .where(eq(Schema.events.runId, effectiveRunId)) | ||
| .orderBy(Schema.events.eventId); | ||
| allEvents = eventRows.map((e) => EventSchema.parse(compact(e))); |
There was a problem hiding this comment.
| allEvents = eventRows.map((e) => EventSchema.parse(compact(e))); | |
| allEvents = eventRows.map((e) => { | |
| e.eventData ||= e.eventDataJson; | |
| const parsed = EventSchema.parse(compact(e)); | |
| return stripEventDataRefs(parsed, resolveData); | |
| }); |
Preloaded events in run_started path are missing the eventData ||= eventDataJson legacy fallback and stripEventDataRefs call, causing legacy events to lose their data during workflow replay.
TooTallNate
left a comment
There was a problem hiding this comment.
Nice optimization — combining the run fetch, event fetch, and run_started creation into a single round-trip is a clean TTFB win. The overall approach is sound, but there's a correctness bug in the postgres preloaded events path that needs fixing before merge, plus a couple of non-blocking concerns.
Summary of findings:
- Blocking: Missing
eventData ||= eventDataJsonlegacy fallback in postgres preloaded events (see inline comment) - Blocking: Early return for already-running runs in postgres re-fetches the run but doesn't return preloaded events, creating an asymmetry with the local world (see inline comment)
- Non-blocking: The
(result as any).eventDatadelete in postgres is redundant after the spread logic already excludes it — minor style nit - Non-blocking:
world-verceldoesn't surface theeventsfield yet, but since it's optional and the runtime falls back gracefully, this is fine for now
| .from(Schema.events) | ||
| .where(eq(Schema.events.runId, effectiveRunId)) | ||
| .orderBy(Schema.events.eventId); | ||
| allEvents = eventRows.map((e) => EventSchema.parse(compact(e))); |
There was a problem hiding this comment.
Blocking: Missing eventData ||= eventDataJson legacy fallback.
Every other read path in this file (events.get, events.list, events.listByCorrelationId) applies v.eventData ||= v.eventDataJson before parsing. This preloaded events path skips it, which means legacy events that only have data in the payload (jsonb) column — and not in payload_cbor — will silently have eventData: undefined during replay, breaking those workflows.
Suggested fix:
allEvents = eventRows.map((e) => {
e.eventData ||= e.eventDataJson;
return EventSchema.parse(compact(e));
});(The Vercel bot comment also suggests applying stripEventDataRefs here. Whether that's needed depends on whether the runtime expects resolved or raw refs — getAllWorkflowRunEvents uses the default resolveData: 'all' so applying it would be consistent, though the runtime would work either way since it resolves everything.)
| .limit(1); | ||
| if (fullRun) { | ||
| return { run: deserializeRunError(compact(fullRun)) }; | ||
| } |
There was a problem hiding this comment.
Blocking: When the run is already running, this returns { run } with no events field. The runtime will then fall back to getAllWorkflowRunEvents(), which is fine functionally — but it means re-invocations of an already-running workflow get no TTFB benefit from the preloaded events optimization.
More importantly, this early return happens before the event insertion, so it short-circuits the rest of the create method. That means it also skips the requestId idempotency check. If two concurrent invocations race, the first one transitions pending→running and continues, but the second hits this branch and also continues with a valid { run } result. Is that the intended behavior? Normally a duplicate run_started would be guarded by the DB unique constraint on eventId, but here we skip insertion entirely.
If this is intentional (i.e., it's safe for multiple invocations to proceed since replay is deterministic), a comment explaining that would help. If not, this should at minimum return the preloaded events for consistency, or throw an EntityConflictError to signal the duplicate.
| // If already running, return the run directly without | ||
| // creating a duplicate event. | ||
| if (currentRun.status === 'running') { | ||
| return { run: currentRun }; |
There was a problem hiding this comment.
Non-blocking (same concern as postgres): This early return for already-running also returns { run: currentRun } with no event and no events. The runtime will fall back to getAllWorkflowRunEvents() which is fine, but worth a comment explaining why skipping the preloaded events is acceptable here (presumably because this path is a rare race condition).
| specVersion: SPEC_VERSION_CURRENT, | ||
| }, | ||
| { requestId } | ||
| ); |
There was a problem hiding this comment.
Non-blocking observation: Now that runs.get is removed and the runtime always calls events.create(run_started), the behavior changes for runs that are already running. Previously, runs.get would succeed and the if (status === 'pending') guard would skip the run_started creation. Now, every invocation attempts run_started regardless of status.
This works because the world implementations have an early-return for already-running runs — but it's a semantic change worth noting in the PR description. The contract is now: events.create('run_started') must be idempotent for running status (return the run without error), not just for pending → running transitions.
This PR extracts some of the safer performance improvements originally added as part of #1537